What Happens When Your Coworkers Are AI Agents
In this episode, we talk to writer Evan Ratliff about how he created a small startup made entirely of AI employees--and what his findings reveal about the reality of an agentic future. This year, AI agents have been at the forefront of tech companies' ambitions. OpenAI's Sam Altman has often talked about a possible billion-dollar company being spun up with just one human and an army of AI agents. And so last summer, journalist Evan Ratliff decided to try to become that unicorn himself--by creating HarumoAI, a small startup made up of AI employees and executives. Hosts Michael Calore and Lauren Goode sit down with Evan to discuss how it's going, and the current promises and realities of AI agents.

Write to us at uncannyvalley@wired.com. You can always listen to this week's podcast through the audio player on this page, but if you want to subscribe for free to get every episode, here's how: if you're on an iPhone or iPad, open the app called Podcasts, or just tap this link.

Hey, Lauren, how are you doing?

It was so fantastic that I had a hard time coming back, honestly. And I saw a lot of really beautiful art.

Not a bad place to go for vacation, I have to say.

I've heard this before, I confirmed it. And after seeing so much incredible art and just people doing stuff with their hands and tangible goods, I was like, I don't want to go back to the world of AI. I didn't want to go back to sitting in a coffee shop and hearing everyone pitching their AI startups, and driving on the 101 and seeing the billboards. I was just like, what? No, keep me in the land of burrata and Caravaggio.

Well, Lauren, I'm sorry to tell you that you came back on the show just in time to talk about AI agents. It's something that we've talked about a lot this year, and our listeners have heard about it a lot, and we're not sick of talking about it.
DeLTa: A Decoding Strategy based on Logit Trajectory Prediction Improves Factuality and Reasoning Ability
He, Yunzhen, Takase, Yusuke, Ishibashi, Yoichi, Shimodaira, Hidetoshi
Large Language Models (LLMs) are increasingly being used in real-world applications. However, concerns about the reliability of the content they generate persist, as it frequently deviates from factual correctness or exhibits deficiencies in logical reasoning. This paper proposes a novel decoding strategy aimed at enhancing both factual accuracy and inferential reasoning without requiring any modifications to the architecture or pre-trained parameters of LLMs. Our approach adjusts next-token probabilities by analyzing the trajectory of logits from lower to higher layers in Transformers and applying linear regression. We find that this Decoding by Logit Trajectory-based approach (DeLTa) effectively reinforces factuality and reasoning while mitigating incorrect generation. Experiments on TruthfulQA demonstrate that DeLTa attains up to a 4.9% improvement over the baseline. Furthermore, it enhances performance by up to 8.1% on StrategyQA and 7.3% on GSM8K, both of which demand strong reasoning capabilities.
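The core mechanism the abstract describes — fitting a linear regression to each token's logit trajectory across Transformer layers — can be sketched as follows. This is a minimal illustration of the idea, not the paper's implementation; the per-vocabulary-entry regression and the choice of extrapolation target are assumptions made for the sketch.

```python
import numpy as np

def delta_logits(layer_logits, target_layer=None):
    """Sketch of a DeLTa-style adjustment: fit a least-squares line to
    each vocabulary entry's logit value as a function of layer depth,
    then use the line's value at `target_layer` in place of raw logits.

    layer_logits: array of shape (num_layers, vocab_size), the logits
    obtained by projecting each layer's hidden state to the vocabulary.
    """
    num_layers, _ = layer_logits.shape
    if target_layer is None:
        target_layer = num_layers - 1
    xs = np.arange(num_layers, dtype=float)
    # np.polyfit with a 2-D y fits one line per column (vocab entry).
    slope, intercept = np.polyfit(xs, layer_logits, deg=1)
    return slope * target_layer + intercept
```

Next-token probabilities would then come from a softmax over the adjusted logits rather than the final layer's raw logits.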
Escaping Collapse: The Strength of Weak Data for Large Language Model Training
Amin, Kareem, Babakniya, Sara, Bie, Alex, Kong, Weiwei, Syed, Umar, Vassilvitskii, Sergei
Synthetically-generated data plays an increasingly larger role in training large language models. However, while synthetic data has been found to be useful, studies have also shown that without proper curation it can cause LLM performance to plateau, or even "collapse", after many training iterations. In this paper, we formalize this question and develop a theoretical framework to investigate how much curation is needed in order to ensure that LLM performance continually improves. We find that the requirements are nearly minimal. We describe a training procedure that converges to an optimal LLM even if almost all of the non-synthetic training data is of poor quality. Our analysis is inspired by boosting, a classic machine learning technique that leverages a very weak learning algorithm to produce an arbitrarily good classifier. Our training procedure subsumes many recently proposed methods for training LLMs on synthetic data, and thus our analysis sheds light on why they are successful, and also suggests opportunities for future improvement. We present experiments that validate our theory, and show that dynamically focusing labeling resources on the most challenging examples -- in much the same way that boosting focuses the efforts of the weak learner -- leads to improved performance.
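The boosting analogy can be made concrete with a toy selector that spends a fixed curation budget on the examples the current model finds hardest. The function name and the loss-based hardness measure are illustrative assumptions, not the paper's actual procedure:

```python
import numpy as np

def select_for_labeling(losses, budget):
    """Toy sketch of boosting-style curation: given the current model's
    per-example losses, return the indices of the `budget` hardest
    examples, which is where labeling/curation effort is focused --
    analogous to how boosting upweights misclassified points.
    """
    losses = np.asarray(losses)
    # Sort indices by loss, descending, and keep the top `budget`.
    return np.argsort(losses)[::-1][:budget].tolist()
```

In an actual training loop this selection would run each iteration, with the curated examples mixed back into the (largely synthetic) training stream.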
Lexico: Extreme KV Cache Compression via Sparse Coding over Universal Dictionaries
Kim, Junhyuck, Park, Jongho, Cho, Jaewoong, Papailiopoulos, Dimitris
We introduce Lexico, a novel KV cache compression method that leverages sparse coding with a universal dictionary. Our key finding is that the key-value cache in modern LLMs can be accurately approximated using sparse linear combinations from a small, input-agnostic dictionary of 4k atoms, enabling efficient compression across different input prompts, tasks, and models. Using orthogonal matching pursuit for sparse approximation, Lexico achieves flexible compression ratios through direct sparsity control. Lexico maintains 90-95% of the original performance while using only 15-25% of the full KV-cache memory, outperforming both quantization and token eviction methods. Notably, Lexico remains effective in low-memory regimes where 2-bit quantization fails, achieving up to 1.7× better compression on LongBench and GSM8K while maintaining high accuracy.

Figure 1: Memory usage vs. performance of Lexico compared to other key-value (KV) cache compression methods on GSM8K. The figure illustrates the relationship between KV cache size and the performance of Lexico on Llama models under GSM8K 5-shot evaluation. Lexico consistently outperforms both eviction-based methods (SnapKV, PyramidKV) and quantization-based methods (per-token quantization, KIVI, ZipCache).

Transformers (Vaswani et al., 2017) have become the backbone of frontier Large Language Models (LLMs), driving progress in domains beyond natural language processing. However, Transformers are typically limited by their significant memory requirements. This stems not only from the large number of model parameters, but also from having to maintain the KV cache, which grows in proportion to the model size (i.e., the number of layers, heads, and embedding dimension) and the token length of the input.
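The sparse approximation step named in the abstract is standard orthogonal matching pursuit. Below is a textbook OMP sketch, not the paper's optimized implementation; dictionary size and shapes are placeholders, and columns are assumed unit-norm:

```python
import numpy as np

def omp(dictionary, target, sparsity):
    """Orthogonal matching pursuit: approximate `target` as a sparse
    linear combination of columns (atoms) of `dictionary`.

    dictionary: (d, n_atoms) with (approximately) unit-norm columns.
    Returns a length-n_atoms code vector with at most `sparsity`
    nonzero entries.
    """
    residual = target.copy()
    support = []
    for _ in range(sparsity):
        # Greedily pick the atom most correlated with the residual.
        idx = int(np.argmax(np.abs(dictionary.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Re-fit coefficients on the whole support by least squares.
        sub = dictionary[:, support]
        coef, *_ = np.linalg.lstsq(sub, target, rcond=None)
        residual = target - sub @ coef
    codes = np.zeros(dictionary.shape[1])
    codes[support] = coef
    return codes
```

In a Lexico-like setting, `target` would be a key or value vector from the cache, and only the support indices and coefficients would be stored, giving direct control over the compression ratio via `sparsity`.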
S$^3$c-Math: Spontaneous Step-level Self-correction Makes Large Language Models Better Mathematical Reasoners
Yan, Yuchen, Jiang, Jin, Liu, Yang, Cao, Yixin, Xu, Xin, Zhang, Mengdi, Cai, Xunliang, Shao, Jian
Self-correction is a novel method that can stimulate the potential reasoning abilities of large language models (LLMs). It involves detecting and correcting errors during the inference process when LLMs solve reasoning problems. However, recent works do not regard self-correction as a spontaneous and intrinsic capability of LLMs. Instead, such correction is achieved through post-hoc generation, external knowledge introduction, multi-model collaboration, and similar techniques. In this paper, we propose a series of mathematical LLMs called S$^3$c-Math, which are able to perform Spontaneous Step-level Self-correction for Mathematical reasoning. This capability helps LLMs recognize whether their ongoing inference tends to contain errors and simultaneously correct these errors to produce a more reliable response. We propose a method that employs step-level sampling to construct step-wise self-correction data for achieving this ability. Additionally, we implement a training strategy that uses the above-constructed data to equip LLMs with spontaneous step-level self-correction capacities. Our data and methods have been demonstrated to be effective across various foundation LLMs, consistently showing significant progress in evaluations on GSM8K, MATH, and other mathematical benchmarks. To the best of our knowledge, we are the first to introduce the spontaneous step-level self-correction ability of LLMs in mathematical reasoning.
Investigating the Robustness of LLMs on Math Word Problems
Anantheswaran, Ujjwala, Gupta, Himanshu, Scaria, Kevin, Verma, Shreyas, Baral, Chitta, Mishra, Swaroop
Large Language Models (LLMs) excel at various tasks, including solving math word problems (MWPs), but struggle with real-world problems containing irrelevant information. To address this, we propose a prompting framework that generates adversarial variants of MWPs by adding irrelevant variables. We introduce a dataset, ProbleMATHIC, containing both adversarial and non-adversarial MWPs. Our experiments reveal that LLMs are susceptible to distraction by numerical noise, resulting in an average relative performance drop of ~26% on adversarial MWPs. To mitigate this, we fine-tune LLMs (Llama-2, Mistral) on the adversarial samples from our dataset. Fine-tuning on adversarial training instances improves performance on adversarial MWPs by ~8%, indicating increased robustness to noise and better ability to identify relevant data for reasoning. Finally, to assess the generalizability of our prompting framework, we introduce GSM-8K-Adv, an adversarial variant of the GSM-8K benchmark. LLMs continue to struggle when faced with adversarial information, with performance dropping by up to ~6%.
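The paper generates adversarial variants via an LLM prompting framework; a toy string-level stand-in makes the idea concrete. The function name and the assumption that the last sentence is the question are illustrative, not the authors' method:

```python
def add_irrelevant_variable(problem, distractor):
    """Toy sketch of adversarial MWP construction: inject a sentence
    carrying an irrelevant numeric variable just before the final
    question sentence, leaving the actual question intact.
    Assumes the last sentence of `problem` is the question.
    """
    parts = problem.rsplit(". ", 1)
    if len(parts) == 2:
        setup, question = parts
        return f"{setup}. {distractor} {question}"
    # Single-sentence problem: prepend the distractor.
    return f"{distractor} {problem}"
```

A robust model should produce the same answer for the original problem and its distractor-augmented variant; the ~26% relative drop reported above measures how far current LLMs are from that.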
Vec2Vec: A Compact Neural Network Approach for Transforming Text Embeddings with High Fidelity
Vector embeddings have become ubiquitous tools for many language-related tasks. A leading embedding model is OpenAI's text-ada-002, which can embed approximately 6,000 words into a 1,536-dimensional vector. While powerful, text-ada-002 is not open source and is only available via API. We trained a simple neural network to convert open-source 768-dimensional MPNet embeddings into text-ada-002 embeddings. We compiled a subset of 50,000 online food reviews, calculated MPNet and text-ada-002 embeddings for each review, and trained a simple neural network for 75 epochs. The neural network was designed to predict the corresponding text-ada-002 embedding for a given MPNet embedding. Our model achieved an average cosine similarity of 0.932 on 10,000 unseen reviews in our held-out test dataset. We manually assessed the quality of our predicted embeddings for vector search over text-ada-002-embedded reviews. While not as good as real text-ada-002 embeddings, predicted embeddings were able to retrieve highly relevant reviews. Our final model, Vec2Vec, is lightweight (<80 MB) and fast. Future steps include training a neural network with a more sophisticated architecture and a larger dataset of paired embeddings to achieve greater performance. The ability to convert between and align embedding spaces may be helpful for interoperability, limiting dependence on proprietary models, protecting data privacy, reducing costs, and offline operations.
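The abstract only says "simple neural network", so the exact architecture is unknown; the sketch below assumes a single hidden layer (the 1024-unit width is made up) mapping 768-dimensional MPNet vectors to 1,536-dimensional targets, with cosine similarity as the evaluation metric described above:

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

class Vec2VecSketch:
    """Hypothetical Vec2Vec-style mapper: one hidden ReLU layer from
    768-dim source embeddings to 1536-dim target embeddings. Weights
    here are random; in practice they would be trained to minimize
    distance to the paired target embeddings over 75 epochs.
    """
    def __init__(self, d_in=768, d_hidden=1024, d_out=1536):
        self.w1 = rng.normal(0.0, 0.02, (d_in, d_hidden))
        self.w2 = rng.normal(0.0, 0.02, (d_hidden, d_out))

    def __call__(self, x):
        h = np.maximum(x @ self.w1, 0.0)  # ReLU hidden layer
        return h @ self.w2
```

At these dimensions the two weight matrices total roughly (768×1024 + 1024×1536) × 4 bytes ≈ 9 MB in float32, comfortably under the <80 MB figure reported for the real model.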
Feeding your dog peas could lead to canine heart disease, study finds
Feeding your dog the remains of your dinner can seem like a harmless thing to do, but a new study warns peas could increase their risk of getting heart disease. Scientists in Massachusetts have found a link between canine consumption of peas and the development of canine dilated cardiomyopathy (DCM) – an often fatal condition that causes a dog's heart muscle to enlarge. As the heart dilates and becomes larger, it becomes harder to pump, which can lead to heart valve leaks or a build-up of fluids in the chest. Worryingly, peas and other legumes including lentils and chickpeas have been ingredients in some 'grain-free' dog foods for years – and could be responsible for hundreds of dog deaths. 'Grain-free' dog foods containing legumes instead of grain have already been investigated by the US Food and Drug Administration (FDA).
Announcing The Launch of Helio
CircleUp exists to help promising entrepreneurs raise capital and to help sophisticated investors back breakthrough brands, with confidence and efficiency on both sides. Since we opened our doors in 2012, we've been challenging private equity norms by replacing manual company sourcing and vetting with a technology-driven private marketplace. The ability to standardize and extract deep insights from large data sets has long been core to CircleUp's approach. Early last year, we introduced The Classifier, which analyzes each company that applies to CircleUp based on an average of 90,000 data points. Helio proactively collects billions of data points on over 1.2 million consumer and retail companies in the U.S. to analyze the relative strength of each company across key metrics.